*RCRA Nationwide Hedonic Study
*Coarsened Exact Matching (CEM) Routine
*Created: 2/9/2021
*Created by: Dennis Guignet
*Last Revised: 2/22/2022
*Last Revised by: Dennis Guignet

********************************************************************************

*This do-file takes the completed dataset of all transactions and prunes and 
*	re-weights it so that the treated group (0-750m) and control group 
*	(750-1500m) around a Corrective Action (CA) are more similar. 
*	A new dataset is saved with the CEM weights for re-estimating the 
*	hedonic regression models. The second half of the do-file computes 
*	L1 imbalance statistics before and after weighting, which generate 
*	Tables A13 and A14 in Appendix F.1. 


********************************************************************************
********************************************************************************



*set empty cells for factor variables to drop
set emptycells drop
clear all
*increase max variables allowed b/c factor variables
set maxvar 100000
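*cem and imb are user-written commands and do not ship with Stata. The lines 
*	below are a minimal sketch, not part of the original routine, and assume 
*	the cem package can be installed from SSC on this machine.
capture which cem
if _rc {
	ssc install cem
}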


*bring in estimating dataset with only sales w/in 1500m of CA
use "$salesfolder\All_Sales_Final_Cleaned_CA1500m", clear
count
*create treatment dummy for matching
gen dCA0_750=0
replace dCA0_750=1 if (dpreCA0_750+dmidCA0_750+dpostCA0_750)>0
tab dCA0_750
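*Optional check (a sketch, not part of the original routine): Stata treats 
*	missing values as larger than any number, so a missing sum in the line 
*	above would be coded as treated. The count below should be zero, assuming 
*	the three component dummies are never missing in this sample.
count if missing(dpreCA0_750, dmidCA0_750, dpostCA0_750)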

*derive some descriptive stats to get a sense of the distributions and inform 
*	the cutpoints used to coarsen variables in the "cem" command.
sum age if age_miss==0, detail
sum p_nbdev_2011_500, detail
sum acres if acres_miss==0, detail
sum sqftstrc if sqftstrc_miss==0, detail
sum bathtot if bathtot_miss==0, detail

	
*Run matching weights algorithm
cem tranyr (#0) mycntyid (#0) bathtot (0 1 2 3) bathtot_miss (#0) ///
	age (0 20 70) age_miss (#0) p_nbdev_2011_500 (30 70) sqftstrc (0 1700 3800) ///
	sqftstrc_miss (#0) acres (0 0.125 0.25) acres_miss (#0), treatment(dCA0_750)
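	*Note: (#0) requests no coarsening, so those variables are matched exactly. 
	*	Cutoffs for the other variables were chosen based on intuition and the 
	*	distribution in the unmatched sample. More specifically, continuous 
	*	variables have a bin for the zero/missing category and then cutpoints 
	*	at roughly the 25th and 75th percentiles. 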
	
*A few quick checks on weights	
tab dCA0_750 cem_matched
sum cem_weights if cem_matched==0
sum cem_weights if cem_matched==1
sum cem_weights if cem_matched==1 & dCA0_750==1
sum cem_weights if cem_matched==1 & dCA0_750==0
sum cem_weights if cem_matched==1 & dCA0_750==0, detail	
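*Optional checks (a sketch, not part of the original routine): by construction 
*	cem should give unmatched observations a weight of 0 and matched treated 
*	observations a weight of 1, so both counts below are expected to be zero.
count if cem_matched==0 & cem_weights!=0
count if cem_matched==1 & dCA0_750==1 & cem_weights!=1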
	
	
*save CEM weighted dataset
save "$salesfolder\All_Sales_Final_Cleaned_CA1500m_CEM", replace

*keep only matched observations and save
drop if cem_matched==0
count
save "$salesfolder\All_Sales_Final_Cleaned_CA1500m_CEM_MatchOnly", replace

********************************************************************************


*stats to check imbalance (Tables A13 and A14 in Appendix F.1)
use "$salesfolder\All_Sales_Final_Cleaned_CA1500m_CEM", clear

*covariates for the imbalance checks (tranyr and mycntyid are excluded)
local imbvars acres acres_miss stories stories_miss bathtot bathtot_miss ///
	sqftstrc sqftstrc_miss age age_miss p_nbdev_2011_200 p_nbdev_2011_500 hwy500m ///
	cntTSD0_5000
*Before CEM weights
imb `imbvars', treatment(dCA0_750) 
*After CEM weights
imb `imbvars', treatment(dCA0_750) useweights

*END
	
	
	





